An Empirical Comparison of Contemporary Unsupervised Approaches for Extractive Speech Summarization

Authors

  • Shih-Hung Liu
  • Kuan-Yu Chen
  • Kai-Wun Shih
  • Berlin Chen
  • Hsin-Min Wang
  • Wen-Lian Hsu

Abstract

With the rapid development of the Internet and the arrival of the big data era, automatic summarization has emerged as a popular research topic. Automatic summarization aims to select the important sentences of a text or spoken document that best represent its topic (theme), subject to a predefined summarization ratio. In this study, we frame the summarization task as an ad-hoc information retrieval (IR) problem and employ the mathematically sound language modeling (LM) framework for extractive speech summarization. This framework selects important sentences in an unsupervised manner and has already shown preliminary success. The contribution of this paper is threefold. First, building on relevance modeling, we explore several effective sentence modeling formulations to enhance the sentence models in the LM-based summarization framework, and we make the first use of a tri-mixture model to improve extractive speech summarization. Second, because language models suffer from data sparseness and the common remedy is smoothing, we investigate three different smoothing approaches and evaluate how they affect summarization performance. Third, we apply the well-studied BM25 ranking model from the IR community, together with its variants, to rank important sentences for extractive speech summarization. Experiments conducted on a publicly available dataset (MATBN) show that the applied methods deliver effective summarization performance compared with other well-practiced and state-of-the-art unsupervised methods.
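The abstract names two families of unsupervised sentence rankers: LM-based scoring with smoothing, and BM25. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It scores each sentence by the log-likelihood of the whole document under a smoothed unigram sentence model. Two common smoothing methods, Dirichlet and Jelinek-Mercer, are shown; the abstract does not say which three approaches the paper compares. Alternatively, it scores each sentence with Okapi BM25, treating every sentence as a retrieval unit and the document as the query. The function names, the default parameters (`mu=50`, `lam=0.5`, `k1=1.2`, `b=0.75`), and the use of the document itself as the background model are all hypothetical choices. Relevance modeling and the tri-mixture model are not shown.

```python
import math
from collections import Counter


def lm_score(query, sentence, background, method="dirichlet", mu=50.0, lam=0.5):
    """log P(query | sentence) under a smoothed unigram sentence model.

    Assumption (illustrative): `background` contains every query word
    (here it is the document itself), so no smoothed probability is zero.
    Sentences are assumed to be non-empty.
    """
    s_tf, s_len = Counter(sentence), len(sentence)
    bg_tf, bg_len = Counter(background), len(background)
    score = 0.0
    for w, q_tf in Counter(query).items():
        p_bg = bg_tf[w] / bg_len
        if method == "dirichlet":
            # Dirichlet prior smoothing toward the background model
            p = (s_tf[w] + mu * p_bg) / (s_len + mu)
        elif method == "jelinek-mercer":
            # Linear interpolation of sentence and background models
            p = (1 - lam) * s_tf[w] / s_len + lam * p_bg
        else:
            raise ValueError(f"unknown smoothing method: {method}")
        score += q_tf * math.log(p)
    return score


def bm25_score(query, sentence, sentences, k1=1.2, b=0.75):
    """Okapi BM25, treating each sentence in `sentences` as a 'document'."""
    n = len(sentences)
    avg_len = sum(len(s) for s in sentences) / n
    tf = Counter(sentence)
    norm = k1 * (1 - b + b * len(sentence) / avg_len)  # length normalization
    score = 0.0
    for w in set(query):
        df = sum(1 for s in sentences if w in s)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        f = tf[w]
        score += idf * f * (k1 + 1) / (f + norm)
    return score


def summarize(sentences, ratio, score_fn):
    """Keep the top round(ratio * N) sentences, returned in original order.

    `score_fn(document, sentence, sentences)` is a hypothetical hook that
    lets either scorer above be plugged in.
    """
    document = [w for s in sentences for w in s]
    k = max(1, round(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_fn(document, sentences[i], sentences),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    sents = [
        "speech summarization selects sentences".split(),
        "weather was sunny today".split(),
        "speech summarization ranks sentences".split(),
    ]
    # LM scorer: the document is the query, and also the background model
    summary = summarize(sents, 0.6, lambda doc, s, _: lm_score(doc, s, doc))
    for s in summary:
        print(" ".join(s))  # prints the first and third sentences
```

The off-topic second sentence is dropped because the repeated document words ("speech", "summarization", "sentences") receive higher smoothed probability under the topical sentence models. Note that if BM25 takes its IDF statistics from the document's own sentences, as in this sketch, words that recur across many sentences receive low IDF. In practice, IDF would usually be estimated from a larger corpus.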

Similar resources

A Risk Minimization Framework for Extractive Speech Summarization

In this paper, we formulate extractive summarization as a risk minimization problem and propose a unified probabilistic framework that naturally combines supervised and unsupervised summarization models to inherit their individual merits as well as to overcome their inherent limitations. In addition, the introduction of various loss functions also provides the summarization framework with a fle...

Intonational phrases for speech summarization

Extractive speech summarization approaches select relevant segments of spoken documents and concatenate them to generate a summary. The extraction unit chosen, whether a sentence, syntactic constituent, or other segment, has a significant impact on the overall quality and fluency of the summary. Even though sentences tend to be the choice of most extractive speech summarizers, in this paper...

Positional language modeling for extractive broadcast news speech summarization

Extractive summarization, with the intention of automatically selecting a set of representative sentences from a text (or spoken) document so as to concisely express the most important theme of the document, has been an active area of experimentation and development. A recent trend of research is to employ the language modeling (LM) approach for important sentence selection, which has proven to...

Long story short - Global unsupervised models for keyphrase based meeting summarization

We analyze and compare two different methods for unsupervised extractive spontaneous speech summarization in the meeting domain. Based on utterance comparison, we introduce an optimal formulation for the widely used greedy maximum marginal relevance (MMR) algorithm. Following the idea that information is spread over the utterances in form of concepts, we describe a system which finds an optimal...

Hybrids of supervised and unsupervised models for extractive speech summarization

Speech summarization, distilling important information and removing redundant and incorrect information from spoken documents, has become an active area of intensive research in the recent past. In this paper, we consider hybrids of supervised and unsupervised models for extractive speech summarization. Moreover, we investigate the use of the unsupervised summarizer to improve the performance o...

Journal title:
  • IJCLCLP

Volume 22

Publication date: 2017